17 research outputs found

Multilingual Augmentation for Robust Visual Question Answering in Remote Sensing Images

Author: Mou Lichao
Yuan Zhenghang
Zhu Xiao Xiang
Publication date: 07/04/2023

Visual question answering for remote sensing data (RSVQA), which aims to answer questions about the content of remotely sensed images, has attracted considerable attention. However, previous work has paid little attention to the robustness of RSVQA models. As we aim to enhance the reliability of RSVQA models, the key challenge is how to learn representations that are robust to new words and to different question templates with the same meaning. With the proposed augmented dataset, we obtain additional questions that have the same meaning as the original ones. To make better use of this information, we propose a contrastive learning strategy for training RSVQA models that are robust to diverse question templates and words. Experimental results demonstrate that the proposed augmented dataset is effective in improving the robustness of the RSVQA model. In addition, the contrastive learning strategy performs well on the low resolution (LR) dataset. Comment: This paper was submitted to the JURSE 2023 conference on November 5, 202

arXiv.org e-Print Archive
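
The contrastive idea in the abstract above can be illustrated with a generic InfoNCE-style loss, in which the embedding of an original question is pulled toward the embedding of its same-meaning augmented version and pushed away from the other questions in the batch. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the loss, and the function names, the cosine similarity, and the `temperature` value are all hypothetical choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of question embeddings.

    Assumption: positives[i] embeds a paraphrase of the question embedded
    by anchors[i]; every other entry of `positives` acts as a negative.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: each original question and its paraphrase point the same way.
originals = [[1.0, 0.0], [0.0, 1.0]]
paraphrases = [[0.9, 0.1], [0.1, 0.9]]
loss = info_nce(originals, paraphrases)
```

The loss is small when each question sits closest to its own paraphrase and grows when paraphrases are confused with other questions, which is the behavior a robustness-oriented training signal needs.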

Change Detection Meets Visual Question Answering

Author: Mou Lichao
Xiong Zhitong
Yuan Zhenghang
Zhu Xiaoxiang
Publication date: 12/12/2021

The Earth's surface is continually changing, and identifying changes plays an important role in urban planning and sustainability. Although change detection techniques have been successfully developed for many years, these techniques are still limited to experts and facilitators in related fields. In order to provide every user with flexible access to change information and help them better understand land-cover changes, we introduce a novel task: change detection-based visual question answering (CDVQA) on multi-temporal aerial images. In particular, multi-temporal images can be queried to obtain high-level change-based information according to content changes between two input images. We first build a CDVQA dataset including multi-temporal image-question-answer triplets using an automatic question-answer generation method. Then, a baseline CDVQA framework is devised in this work, and it contains four parts: multi-temporal feature encoding, multi-temporal fusion, multi-modal fusion, and answer prediction. In addition, we add a change-enhancing module to the multi-temporal feature encoding, aiming to incorporate more change-related information. Finally, the effects of different backbones and multi-temporal fusion strategies on the performance of the CDVQA task are studied. The experimental results provide useful insights for developing better CDVQA models, which are important for future research on this task. We will make our dataset and code publicly available.

arXiv.org e-Print Archive
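
The four-part baseline named in the abstract above (multi-temporal feature encoding, multi-temporal fusion, multi-modal fusion, answer prediction) can be sketched as a toy pipeline. Everything below is a hypothetical stand-in chosen for illustration: the hand-written encoders replace learned backbones, subtraction and concatenation are only two plausible fusion strategies, and the `weights` matrix is untrained. It shows how the stages connect, not how the authors built them.

```python
def encode_image(pixels):
    """Toy image encoder (stand-in for a CNN backbone): mean and value range."""
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]


def fuse_temporal(feat_t1, feat_t2, strategy="subtract"):
    """Combine features of the two acquisition dates into one change feature."""
    if strategy == "subtract":
        return [b - a for a, b in zip(feat_t1, feat_t2)]
    if strategy == "concat":
        return feat_t1 + feat_t2
    raise ValueError(f"unknown fusion strategy: {strategy}")


def encode_question(question, dim):
    """Toy question encoder: word lengths accumulated into `dim` buckets."""
    vec = [0.0] * dim
    for k, word in enumerate(question.split()):
        vec[k % dim] += len(word)
    return vec


def cdvqa_forward(img_t1, img_t2, question, answers, weights, strategy="subtract"):
    # 1) Multi-temporal feature encoding: one feature vector per date.
    feat_t1, feat_t2 = encode_image(img_t1), encode_image(img_t2)
    # 2) Multi-temporal fusion: merge both dates into a change representation.
    change = fuse_temporal(feat_t1, feat_t2, strategy)
    # 3) Multi-modal fusion: element-wise product with the question vector.
    q = encode_question(question, dim=len(change))
    joint = [c * w for c, w in zip(change, q)]
    # 4) Answer prediction: linear scores over candidate answers, then argmax.
    scores = [sum(w * x for w, x in zip(row, joint)) for row in weights]
    return answers[scores.index(max(scores))]


# Hypothetical inputs: flattened pixel lists and an untrained weight matrix.
answer = cdvqa_forward(
    img_t1=[0, 0, 0, 0],
    img_t2=[0, 0, 10, 10],
    question="has the building area changed",
    answers=["yes", "no"],
    weights=[[1.0, 1.0], [-1.0, -1.0]],
)
```

Keeping the temporal fusion step behind a `strategy` argument mirrors the abstract's point that different multi-temporal fusion strategies can be swapped in and compared.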

Overcoming Language Bias in Remote Sensing Visual Question Answering via Adversarial Training

Author: Mou Lichao
Yuan Zhenghang
Zhu Xiao Xiang
Publication date: 01/06/2023

Visual question answering (VQA) systems offer a user-friendly interface and enable human-computer interaction. However, VQA models commonly face the challenge of language bias, which results from superficial correlations learned between questions and answers. To address this issue, in this study, we present a novel framework to reduce the language bias of VQA for remote sensing data (RSVQA). Specifically, we add an adversarial branch to the original VQA framework. Based on the adversarial branch, we introduce two regularizers to constrain the training process against language bias. Furthermore, to evaluate the performance in terms of language bias, we propose a new metric that combines standard accuracy with the performance drop when incorporating question and random image information. Experimental results demonstrate the effectiveness of our method. We believe that our method can shed light on future work for reducing language bias on the RSVQA task.

arXiv.org e-Print Archive

RRSIS: Referring Remote Sensing Image Segmentation

Author: Hua Yuansheng
Mou Lichao
Yuan Zhenghang
Zhu Xiao Xiang
Publication date: 01/03/2024

Localizing desired objects in remote sensing images is of great use in practical applications. Referring image segmentation, which aims at segmenting out the objects to which a given expression refers, has been extensively studied in natural images. However, this task has received almost no research attention for remote sensing imagery. Considering its potential for real-world applications, in this paper, we introduce referring remote sensing image segmentation (RRSIS) to fill this gap and make some insightful explorations. Specifically, we create a new dataset, called RefSegRS, for this task, enabling us to evaluate different methods. Afterward, we benchmark referring image segmentation methods designed for natural images on the RefSegRS dataset and find that these models show limited efficacy in detecting small and scattered objects. To alleviate this issue, we propose a language-guided cross-scale enhancement (LGCE) module that utilizes linguistic features to adaptively enhance multi-scale visual features by integrating both deep and shallow features. The proposed dataset, benchmarking results, and the designed LGCE module provide insights into the design of a better RRSIS model. We will make our dataset and code publicly available.

arXiv.org e-Print Archive

GETNET: A General End-to-end Two-dimensional CNN Framework for Hyperspectral Image Change Detection

Author: Du Qian
Li Xuelong
Wang Qi
Yuan Zhenghang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/05/2019

Change detection (CD) is an important application of remote sensing, which provides timely change information about the Earth's surface at large scale. With the emergence of hyperspectral imagery, CD technology has been greatly promoted, as hyperspectral data with high spectral resolution are capable of detecting finer changes than traditional multispectral imagery. Nevertheless, the high dimension of hyperspectral data makes it difficult to implement traditional CD algorithms. Besides, endmember abundance information at the subpixel level is often not fully utilized. In order to better handle the high-dimension problem and explore abundance information, this paper presents a General End-to-end Two-dimensional CNN (GETNET) framework for hyperspectral image change detection (HSI-CD). The main contributions of this work are threefold: 1) A mixed-affinity matrix that integrates subpixel representation is introduced to mine more cross-channel gradient features and fuse multi-source information; 2) A 2-D CNN is designed to learn discriminative features effectively from multi-source data at a higher level and enhance the generalization ability of the proposed CD algorithm; 3) A new HSI-CD data set is designed for the objective comparison of different methods. Experimental results on real hyperspectral data sets demonstrate that the proposed method outperforms most state-of-the-art methods.

arXiv.org e-Print Archive

Self-Paced Curriculum Learning for Visual Question Answering on Remote Sensing Data

Author: Mou LiChao
Yuan Zhenghang
Zhu Xiao Xiang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2021

Answering questions in natural language by extracting information from images has great potential in various applications. Although visual question answering (VQA) for natural images has been broadly studied, VQA for remote sensing data is still in the early research stage. For the same remote sensing image, there exist questions with dramatically different difficulty levels. Treating these questions equally may mislead the model and limit the VQA model performance. Considering this problem, in this work, we propose a self-paced curriculum learning (SPCL) based VQA model with hard and soft weighting strategies for remote sensing data. Like the human learning process, the model is trained gradually from easy to hard question samples. Extensive experimental results on two datasets demonstrate that the proposed training method can achieve promising performance.

Institute of Transport Research:Publications
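
The hard and soft weighting strategies mentioned in the abstract above can be illustrated with the weighting rules commonly used in the self-paced learning literature. This is a sketch under assumptions, not the paper's exact formulation: the abstract does not give the formulas, so the binary rule, the linear soft rule, the geometric threshold schedule, and all function names here are hypothetical.

```python
def hard_weights(losses, threshold):
    """Binary weighting: a sample counts only if its loss is below the threshold."""
    return [1.0 if loss < threshold else 0.0 for loss in losses]


def soft_weights(losses, threshold):
    """Linear soft weighting: easier samples get weights closer to 1."""
    return [max(0.0, 1.0 - loss / threshold) for loss in losses]


def weighted_loss(losses, weights):
    """Weighted mean of per-sample losses; 0.0 if no sample is selected."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * loss for w, loss in zip(weights, losses)) / total


def threshold_schedule(start, growth, epochs):
    """Assumed schedule: raise the threshold each epoch to admit harder samples."""
    return [start * growth ** epoch for epoch in range(epochs)]


# Per-question losses standing in for easy, medium, and hard questions.
question_losses = [0.2, 0.9, 1.5]
for threshold in threshold_schedule(start=0.5, growth=2.0, epochs=3):
    selected = hard_weights(question_losses, threshold)
    graded = soft_weights(question_losses, threshold)
    batch_loss = weighted_loss(question_losses, graded)
```

With a low threshold only the easiest questions contribute; as the threshold grows, harder questions enter training, which is the easy-to-hard progression the abstract describes. The soft rule avoids the abrupt in-or-out decision of the hard rule by grading each sample's contribution.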

Change-Aware Visual Question Answering

Author: Mou LiChao
Yuan Zhenghang
Zhu Xiao Xiang
Publication date: 22/06/2022

Institute of Transport Research:Publications

From Easy to Hard: Learning Language-Guided Curriculum for Visual Question Answering on Remote Sensing Data

Author: Mou LiChao
Wang Qi
Yuan Zhenghang
Zhu Xiao Xiang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2022

Visual question answering (VQA) for remote sensing scenes has great potential in intelligent human–computer interaction systems. Although VQA in computer vision has been widely researched, VQA for remote sensing data (RSVQA) is still in its infancy. Two characteristics need special consideration for the RSVQA task: 1) no object annotations are available in the RSVQA datasets, which makes it difficult for models to exploit informative region representations, and 2) there are questions with clearly different difficulty levels for each image in the RSVQA task. Directly training a model with questions in a random order may confuse the model and limit the performance. To address these two problems, in this article, a multi-level visual feature learning method is proposed to jointly extract language-guided holistic and regional image features. Besides, a self-paced curriculum learning (SPCL)-based VQA model is developed to train networks with samples in an easy-to-hard way. More specifically, a language-guided SPCL method with a soft weighting strategy is explored in this work. The proposed model is evaluated on three public datasets, and extensive experimental results show that the proposed RSVQA framework can achieve promising performance. Code will be available at https://gitlab.lrz.de/ai4eo/reasoning/VQA-easy2har

Institute of Transport Research:Publications